{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Objective : Loading Data into DataFrames\n",
    "\n",
    "<hr>\n",
    "\n",
    "1. Sources from which dataframes can be created\n",
    "2. Loading from CSV\n",
    "3. Loading from JSON - Structured & Unstructured\n",
    "4. Loading from Excel\n",
    "5. Creating pickled data & Loading from Pickled Data\n",
    "6. Loading from Database\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Sources from which dataframes can be created\n",
    "* Reading data from different sources, here is the list.\n",
    "* Also, includes writing data to different sources.\n",
    "\n",
    "<img src=\"images/IO.png\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Reading CSV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "rental_data = pd.read_csv('../Data/house_rental_data.csv.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 645 entries, 0 to 644\n",
      "Data columns (total 8 columns):\n",
      "Unnamed: 0     645 non-null int64\n",
      "Sqft           645 non-null float64\n",
      "Floor          645 non-null int64\n",
      "TotalFloor     645 non-null int64\n",
      "Bedroom        645 non-null int64\n",
      "Living.Room    645 non-null int64\n",
      "Bathroom       645 non-null int64\n",
      "Price          645 non-null int64\n",
      "dtypes: float64(1), int64(7)\n",
      "memory usage: 40.4 KB\n"
     ]
    }
   ],
   "source": [
    "rental_data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Sqft</th>\n",
       "      <th>Floor</th>\n",
       "      <th>TotalFloor</th>\n",
       "      <th>Bedroom</th>\n",
       "      <th>Living.Room</th>\n",
       "      <th>Bathroom</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1177.698</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>62000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2134.800</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>78000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1138.560</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>58000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1458.780</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>967.776</td>\n",
       "      <td>11</td>\n",
       "      <td>14</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0      Sqft  Floor  TotalFloor  Bedroom  Living.Room  Bathroom  \\\n",
       "0           1  1177.698      2           7        2            2         2   \n",
       "1           2  2134.800      5           7        4            2         2   \n",
       "2           3  1138.560      5           7        2            2         1   \n",
       "3           4  1458.780      2           7        3            2         2   \n",
       "4           5   967.776     11          14        3            2         2   \n",
       "\n",
       "   Price  \n",
       "0  62000  \n",
       "1  78000  \n",
       "2  58000  \n",
       "3  45000  \n",
       "4  45000  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rental_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "rental_data = pd.read_csv('../Data/house_rental_data.csv.txt', index_col = 'Unnamed: 0')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sqft</th>\n",
       "      <th>Floor</th>\n",
       "      <th>TotalFloor</th>\n",
       "      <th>Bedroom</th>\n",
       "      <th>Living.Room</th>\n",
       "      <th>Bathroom</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1177.698</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>62000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2134.800</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>78000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1138.560</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>58000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1458.780</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>967.776</td>\n",
       "      <td>11</td>\n",
       "      <td>14</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Sqft  Floor  TotalFloor  Bedroom  Living.Room  Bathroom  Price\n",
       "1  1177.698      2           7        2            2         2  62000\n",
       "2  2134.800      5           7        4            2         2  78000\n",
       "3  1138.560      5           7        2            2         1  58000\n",
       "4  1458.780      2           7        3            2         2  45000\n",
       "5   967.776     11          14        3            2         2  45000"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rental_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "rental_data = pd.read_csv('../Data/house_rental_data.csv.txt', usecols=lambda c: c.startswith('B'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bedroom</th>\n",
       "      <th>Bathroom</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Bedroom  Bathroom\n",
       "0        2         2\n",
       "1        4         2\n",
       "2        2         1\n",
       "3        3         2\n",
       "4        3         2"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rental_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "rental_data = pd.read_csv('../Data/house_rental_data.csv.txt', nrows=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Sqft</th>\n",
       "      <th>Floor</th>\n",
       "      <th>TotalFloor</th>\n",
       "      <th>Bedroom</th>\n",
       "      <th>Living.Room</th>\n",
       "      <th>Bathroom</th>\n",
       "      <th>Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>1177.698</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>62000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>2134.800</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>78000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1138.560</td>\n",
       "      <td>5</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>58000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1458.780</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>967.776</td>\n",
       "      <td>11</td>\n",
       "      <td>14</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0      Sqft  Floor  TotalFloor  Bedroom  Living.Room  Bathroom  \\\n",
       "0           1  1177.698      2           7        2            2         2   \n",
       "1           2  2134.800      5           7        4            2         2   \n",
       "2           3  1138.560      5           7        2            2         1   \n",
       "3           4  1458.780      2           7        3            2         2   \n",
       "4           5   967.776     11          14        3            2         2   \n",
       "\n",
       "   Price  \n",
       "0  62000  \n",
       "1  78000  \n",
       "2  58000  \n",
       "3  45000  \n",
       "4  45000  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rental_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "rental_data_itr = pd.read_csv('../Data/house_rental_data.csv.txt', chunksize=300)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unnamed: 0     300\n",
      "Sqft           300\n",
      "Floor          300\n",
      "TotalFloor     300\n",
      "Bedroom        300\n",
      "Living.Room    300\n",
      "Bathroom       300\n",
      "Price          300\n",
      "dtype: int64\n",
      "Unnamed: 0     300\n",
      "Sqft           300\n",
      "Floor          300\n",
      "TotalFloor     300\n",
      "Bedroom        300\n",
      "Living.Room    300\n",
      "Bathroom       300\n",
      "Price          300\n",
      "dtype: int64\n",
      "Unnamed: 0     45\n",
      "Sqft           45\n",
      "Floor          45\n",
      "TotalFloor     45\n",
      "Bedroom        45\n",
      "Living.Room    45\n",
      "Bathroom       45\n",
      "Price          45\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "for data in rental_data_itr:\n",
    "    print (data.count())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "titanic_data = pd.read_csv('../Data/titanic-train.csv.txt', index_col = 'PassengerId', na_values={'Ticket':'PC 17599'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                      Braund, Mr. Owen Harris    male  22.0   \n",
       "2            Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0   \n",
       "3                                       Heikkinen, Miss. Laina  female  26.0   \n",
       "4                 Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "5                                     Allen, Mr. William Henry    male  35.0   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0               NaN  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "titanic_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Month</th>\n",
       "      <th>Sales</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1-01</td>\n",
       "      <td>266.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1-02</td>\n",
       "      <td>145.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1-03</td>\n",
       "      <td>183.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1-04</td>\n",
       "      <td>119.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1-05</td>\n",
       "      <td>180.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Month  Sales\n",
       "0  1-01  266.0\n",
       "1  1-02  145.9\n",
       "2  1-03  183.1\n",
       "3  1-04  119.3\n",
       "4  1-05  180.3"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv('../Data/sales-data.csv').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "def parser(x):\n",
    "    return datetime.strptime('200'+x, '%Y-%m')\n",
    " \n",
    "data = pd.read_csv('../Data/sales-data.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sales</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Month</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2001-01-01</th>\n",
       "      <td>266.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001-02-01</th>\n",
       "      <td>145.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001-03-01</th>\n",
       "      <td>183.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001-04-01</th>\n",
       "      <td>119.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2001-05-01</th>\n",
       "      <td>180.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Sales\n",
       "Month            \n",
       "2001-01-01  266.0\n",
       "2001-02-01  145.9\n",
       "2001-03-01  183.1\n",
       "2001-04-01  119.3\n",
       "2001-05-01  180.3"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Loading from JSON"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>aliceblue</th>\n",
       "      <td>240</td>\n",
       "      <td>248</td>\n",
       "      <td>255</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>antiquewhite</th>\n",
       "      <td>250</td>\n",
       "      <td>235</td>\n",
       "      <td>215</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>aqua</th>\n",
       "      <td>0</td>\n",
       "      <td>255</td>\n",
       "      <td>255</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>aquamarine</th>\n",
       "      <td>127</td>\n",
       "      <td>255</td>\n",
       "      <td>212</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>azure</th>\n",
       "      <td>240</td>\n",
       "      <td>255</td>\n",
       "      <td>255</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                0    1    2  3\n",
       "aliceblue     240  248  255  1\n",
       "antiquewhite  250  235  215  1\n",
       "aqua            0  255  255  1\n",
       "aquamarine    127  255  212  1\n",
       "azure         240  255  255  1"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_json('https://raw.githubusercontent.com/corysimmons/colors.json/master/colors.json', orient='records').T.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>programs</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>{'season': '1842-43', 'orchestra': 'New York Philharmonic', 'concerts': [{'Date': '1842-12-07T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}], 'programID': '3853', 'works': [{'workTitle': 'SYMPHONY NO. 5 IN C MINOR, OP.67', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52446*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Timm, Henry C.', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'QUINTET, PIANO, D MINOR, OP. 74', 'ID': '3642*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'composerName': 'Hummel,  Johann'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Etienne, Denis G.', 'ID': '8834*3', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'ARMIDA', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Timm, Henry C.', 'ID': '8835*1', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': 'Duet'}, {'workTitle': 'FIDELIO, OP. 72', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Timm, Henry C.', 'ID': '8837*6', 'soloists': [{'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': '\"In Des Lebens Fruhlingstagen...O spur ich nicht linde,\" Florestan (aria)'}, {'workTitle': 'ABDUCTION FROM THE SERAGLIO,THE, K.384', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Timm, Henry C.', 'ID': '8336*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ach Ich liebte,\" Konstanze (aria)'}, {'workTitle': 'OVERTURE NO. 1, D MINOR, OP. 38', 'conductorName': 'Timm, Henry C.', 'ID': '5543*', 'soloists': [], 'composerName': 'Kalliwoda,  Johann  W.'}], 'id': '38e072a7-8fc9-4f9a-8eac-3957905c0002'}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     programs\n",
       "0  {'season': '1842-43', 'orchestra': 'New York Philharmonic', 'concerts': [{'Date': '1842-12-07T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}], 'programID': '3853', 'works': [{'workTitle': 'SYMPHONY NO. 5 IN C MINOR, OP.67', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52446*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Timm, Henry C.', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'QUINTET, PIANO, D MINOR, OP. 74', 'ID': '3642*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'composerName': 'Hummel,  Johann'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Etienne, Denis G.', 'ID': '8834*3', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'ARMIDA', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Timm, Henry C.', 'ID': '8835*1', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': 'Duet'}, {'workTitle': 'FIDELIO, OP. 72', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Timm, Henry C.', 'ID': '8837*6', 'soloists': [{'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': '\"In Des Lebens Fruhlingstagen...O spur ich nicht linde,\" Florestan (aria)'}, {'workTitle': 'ABDUCTION FROM THE SERAGLIO,THE, K.384', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Timm, Henry C.', 'ID': '8336*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ach Ich liebte,\" Konstanze (aria)'}, {'workTitle': 'OVERTURE NO. 1, D MINOR, OP. 38', 'conductorName': 'Timm, Henry C.', 'ID': '5543*', 'soloists': [], 'composerName': 'Kalliwoda,  Johann  W.'}], 'id': '38e072a7-8fc9-4f9a-8eac-3957905c0002'}"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.set_option('display.max_colwidth', -1)\n",
    "pd.read_json('../Data/raw_nyc_phil.json').head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "with open('../Data/raw_nyc_phil.json') as f:\n",
    "    d = json.load(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>concerts</th>\n",
       "      <th>id</th>\n",
       "      <th>orchestra</th>\n",
       "      <th>programID</th>\n",
       "      <th>season</th>\n",
       "      <th>works</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>[{'Date': '1842-12-07T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>3853</td>\n",
       "      <td>1842-43</td>\n",
       "      <td>[{'workTitle': 'SYMPHONY NO. 5 IN C MINOR, OP.67', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52446*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Timm, Henry C.', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'QUINTET, PIANO, D MINOR, OP. 74', 'ID': '3642*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'composerName': 'Hummel,  Johann'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Etienne, Denis G.', 'ID': '8834*3', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'ARMIDA', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Timm, Henry C.', 'ID': '8835*1', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': 'Duet'}, {'workTitle': 'FIDELIO, OP. 72', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Timm, Henry C.', 'ID': '8837*6', 'soloists': [{'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': '\"In Des Lebens Fruhlingstagen...O spur ich nicht linde,\" Florestan (aria)'}, {'workTitle': 'ABDUCTION FROM THE SERAGLIO,THE, K.384', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Timm, Henry C.', 'ID': '8336*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ach Ich liebte,\" Konstanze (aria)'}, {'workTitle': 'OVERTURE NO. 1, D MINOR, OP. 38', 'conductorName': 'Timm, Henry C.', 'ID': '5543*', 'soloists': [], 'composerName': 'Kalliwoda,  Johann  W.'}]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>[{'Date': '1843-02-18T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]</td>\n",
       "      <td>c7b2b95c-5e0b-431c-a340-5b37fc860b34</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>5178</td>\n",
       "      <td>1842-43</td>\n",
       "      <td>[{'workTitle': 'SYMPHONY NO. 3 IN E FLAT MAJOR, OP. 55 (EROICA)', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52437*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'I PURITANI', 'composerName': 'Bellini,  Vincenzo', 'conductorName': 'Hill, Ureli Corelli', 'ID': '8838*2', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': 'Elvira (aria): \"Qui la voce...Vien, diletto\"'}, {'workTitle': 'CELEBRATED ELEGIE', 'conductorName': 'Hill, Ureli Corelli', 'ID': '3659*', 'soloists': [{'soloistName': 'Boucher, Alfred', 'soloistRoles': 'S', 'soloistInstrument': 'Cello'}], 'composerName': 'Romberg,  Bernhard'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'WILLIAM TELL', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '8839*2', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'STABAT MATER', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '53076*2', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': 'Inflammatus et Accensus (Aria with Chorus)'}, {'workTitle': 'CONCERTO, PIANO, A-FLAT MAJOR, OP. 113', 'composerName': 'Hummel,  Johann', 'conductorName': 'Alpers, William', 'ID': '51568*2', 'soloists': [{'soloistName': 'Timm, Henry C.', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'movement': 'Romanza: Larghetto con moto'}, {'workTitle': 'CONCERTO, PIANO, A-FLAT MAJOR, OP. 113', 'composerName': 'Hummel,  Johann', 'conductorName': 'Alpers, William', 'ID': '51568*3', 'soloists': [{'soloistName': 'Timm, Henry C.', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'movement': 'Rondo alla spagniola: Allegro moderato'}, {'workTitle': 'FREISCHUTZ, DER', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Alpers, William', 'ID': '6709*16', 'soloists': [], 'movement': 'Overture'}]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>[{'Date': '1843-04-07T05:00:00Z', 'eventType': 'Special', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]</td>\n",
       "      <td>894e1a52-1ae5-4fa7-aec0-b99997555a37</td>\n",
       "      <td>Musicians from the New York Philharmonic</td>\n",
       "      <td>10785</td>\n",
       "      <td>1842-43</td>\n",
       "      <td>[{'workTitle': 'EGMONT, OP.84', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52364*1', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Not conducted', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'CONCERTO, PIANO, A MINOR, OP. 85', 'conductorName': 'Hill, Ureli Corelli', 'ID': '4567*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Hummel,  Johann'}, {'workTitle': 'O HAPPY HAPPY HOUR', 'conductorName': 'Not conducted', 'ID': '5150*', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'composerName': 'Pacini,  Giovanni'}, {'workTitle': 'FANTASIA ON SWEEDISH AIRS', 'conductorName': 'Not conducted', 'ID': '5161*', 'soloists': [{'soloistName': 'Boucher, Alfred', 'soloistRoles': 'S', 'soloistInstrument': 'Cello'}], 'composerName': 'Romberg,  Bernhard'}, {'workTitle': 'SEXTET IN E FLAT MAJOR, OP. 30', 'composerName': 'Onslow,  George', 'conductorName': 'Not conducted', 'ID': '5162*2', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'movement': 'Andante con variazioni'}, {'workTitle': 'SEXTET IN E FLAT MAJOR, OP. 30', 'composerName': 'Onslow,  George', 'conductorName': 'Not conducted', 'ID': '5162*3', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'movement': 'Minuetto'}, {'workTitle': 'WILLIAM TELL', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '8839*2', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'FANTASIA AND VARIATIONS ON THEMES FROM NORMA, OP. 12 (FOUR HANDS)', 'conductorName': 'Not conducted', 'ID': '5166*', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Rakeman, Frederick', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Thalberg,  Sigismond'}, {'workTitle': 'MAGIC FLUTE, THE, K.620', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Not conducted', 'ID': '8955*13', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'movement': 'Aria (unspecified)'}, {'workTitle': 'INTRODUCTION AND VARIATIONS ON THE ROMANCE OF JOSEPH, OP. 20', 'conductorName': 'Alpers, William', 'ID': '5172*', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Herz,  Henri'}, {'workTitle': 'QUINTET FOR WINDS AND ORCHESTRA', 'conductorName': 'Not conducted', 'ID': '5174*', 'soloists': [{'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Wiese, Frederick', 'soloistRoles': 'A', 'soloistInstrument': 'Oboe'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}], 'composerName': 'Lindpaintner,  Peter  Von'}]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                         concerts  \\\n",
       "0  [{'Date': '1842-12-07T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]   \n",
       "1  [{'Date': '1843-02-18T05:00:00Z', 'eventType': 'Subscription Season', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]   \n",
       "2  [{'Date': '1843-04-07T05:00:00Z', 'eventType': 'Special', 'Venue': 'Apollo Rooms', 'Location': 'Manhattan, NY', 'Time': '8:00PM'}]               \n",
       "\n",
       "                                     id  \\\n",
       "0  38e072a7-8fc9-4f9a-8eac-3957905c0002   \n",
       "1  c7b2b95c-5e0b-431c-a340-5b37fc860b34   \n",
       "2  894e1a52-1ae5-4fa7-aec0-b99997555a37   \n",
       "\n",
       "                                  orchestra programID   season  \\\n",
       "0  New York Philharmonic                     3853      1842-43   \n",
       "1  New York Philharmonic                     5178      1842-43   \n",
       "2  Musicians from the New York Philharmonic  10785     1842-43   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        works  \n",
       "0  [{'workTitle': 'SYMPHONY NO. 5 IN C MINOR, OP.67', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52446*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Timm, Henry C.', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'QUINTET, PIANO, D MINOR, OP. 74', 'ID': '3642*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'composerName': 'Hummel,  Johann'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Etienne, Denis G.', 'ID': '8834*3', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'ARMIDA', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Timm, Henry C.', 'ID': '8835*1', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': 'Duet'}, {'workTitle': 'FIDELIO, OP. 72', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Timm, Henry C.', 'ID': '8837*6', 'soloists': [{'soloistName': 'Horn, Charles Edward', 'soloistRoles': 'S', 'soloistInstrument': 'Tenor'}], 'movement': '\"In Des Lebens Fruhlingstagen...O spur ich nicht linde,\" Florestan (aria)'}, {'workTitle': 'ABDUCTION FROM THE SERAGLIO,THE, K.384', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Timm, Henry C.', 'ID': '8336*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': '\"Ach Ich liebte,\" Konstanze (aria)'}, {'workTitle': 'OVERTURE NO. 1, D MINOR, OP. 38', 'conductorName': 'Timm, Henry C.', 'ID': '5543*', 'soloists': [], 'composerName': 'Kalliwoda,  Johann  W.'}]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       \n",
       "1  [{'workTitle': 'SYMPHONY NO. 3 IN E FLAT MAJOR, OP. 55 (EROICA)', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52437*', 'soloists': [], 'composerName': 'Beethoven,  Ludwig  van'}, {'workTitle': 'I PURITANI', 'composerName': 'Bellini,  Vincenzo', 'conductorName': 'Hill, Ureli Corelli', 'ID': '8838*2', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': 'Elvira (aria): \"Qui la voce...Vien, diletto\"'}, {'workTitle': 'CELEBRATED ELEGIE', 'conductorName': 'Hill, Ureli Corelli', 'ID': '3659*', 'soloists': [{'soloistName': 'Boucher, Alfred', 'soloistRoles': 'S', 'soloistInstrument': 'Cello'}], 'composerName': 'Romberg,  Bernhard'}, {'interval': 'Intermission', 'ID': '0*', 'soloists': []}, {'workTitle': 'WILLIAM TELL', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '8839*2', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'STABAT MATER', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '53076*2', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}], 'movement': 'Inflammatus et Accensus (Aria with Chorus)'}, {'workTitle': 'CONCERTO, PIANO, A-FLAT MAJOR, OP. 113', 'composerName': 'Hummel,  Johann', 'conductorName': 'Alpers, William', 'ID': '51568*2', 'soloists': [{'soloistName': 'Timm, Henry C.', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'movement': 'Romanza: Larghetto con moto'}, {'workTitle': 'CONCERTO, PIANO, A-FLAT MAJOR, OP. 113', 'composerName': 'Hummel,  Johann', 'conductorName': 'Alpers, William', 'ID': '51568*3', 'soloists': [{'soloistName': 'Timm, Henry C.', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'movement': 'Rondo alla spagniola: Allegro moderato'}, {'workTitle': 'FREISCHUTZ, DER', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Alpers, William', 'ID': '6709*16', 'soloists': [], 'movement': 'Overture'}]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    \n",
       "2  [{'workTitle': 'EGMONT, OP.84', 'composerName': 'Beethoven,  Ludwig  van', 'conductorName': 'Hill, Ureli Corelli', 'ID': '52364*1', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'OBERON', 'composerName': 'Weber,  Carl  Maria Von', 'conductorName': 'Not conducted', 'ID': '8834*4', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'movement': '\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II'}, {'workTitle': 'CONCERTO, PIANO, A MINOR, OP. 85', 'conductorName': 'Hill, Ureli Corelli', 'ID': '4567*', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Hummel,  Johann'}, {'workTitle': 'O HAPPY HAPPY HOUR', 'conductorName': 'Not conducted', 'ID': '5150*', 'soloists': [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'composerName': 'Pacini,  Giovanni'}, {'workTitle': 'FANTASIA ON SWEEDISH AIRS', 'conductorName': 'Not conducted', 'ID': '5161*', 'soloists': [{'soloistName': 'Boucher, Alfred', 'soloistRoles': 'S', 'soloistInstrument': 'Cello'}], 'composerName': 'Romberg,  Bernhard'}, {'workTitle': 'SEXTET IN E FLAT MAJOR, OP. 30', 'composerName': 'Onslow,  George', 'conductorName': 'Not conducted', 'ID': '5162*2', 'soloists': [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'movement': 'Andante con variazioni'}, {'workTitle': 'SEXTET IN E FLAT MAJOR, OP. 30', 'composerName': 'Onslow,  George', 'conductorName': 'Not conducted', 'ID': '5162*3', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}], 'movement': 'Minuetto'}, {'workTitle': 'WILLIAM TELL', 'composerName': 'Rossini,  Gioachino', 'conductorName': 'Alpers, William', 'ID': '8839*2', 'soloists': [], 'movement': 'Overture'}, {'workTitle': 'FANTASIA AND VARIATIONS ON THEMES FROM NORMA, OP. 12 (FOUR HANDS)', 'conductorName': 'Not conducted', 'ID': '5166*', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Rakeman, Frederick', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Thalberg,  Sigismond'}, {'workTitle': 'MAGIC FLUTE, THE, K.620', 'composerName': 'Mozart,  Wolfgang  Amadeus', 'conductorName': 'Not conducted', 'ID': '8955*13', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}, {'soloistName': 'Timm, Henry C.', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}], 'movement': 'Aria (unspecified)'}, {'workTitle': 'INTRODUCTION AND VARIATIONS ON THE ROMANCE OF JOSEPH, OP. 20', 'conductorName': 'Alpers, William', 'ID': '5172*', 'soloists': [{'soloistName': '', 'soloistRoles': '', 'soloistInstrument': ''}, {'soloistName': 'Scharfenberg, William', 'soloistRoles': 'S', 'soloistInstrument': 'Piano'}], 'composerName': 'Herz,  Henri'}, {'workTitle': 'QUINTET FOR WINDS AND ORCHESTRA', 'conductorName': 'Not conducted', 'ID': '5174*', 'soloists': [{'soloistName': 'Lehman', 'soloistRoles': 'A', 'soloistInstrument': 'Flute'}, {'soloistName': 'Wiese, Frederick', 'soloistRoles': 'A', 'soloistInstrument': 'Oboe'}, {'soloistName': 'Groneveldt, Theodore W.', 'soloistRoles': 'A', 'soloistInstrument': 'Clarinet'}, {'soloistName': 'Hegelund, H. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Bassoon'}, {'soloistName': 'Woehning, F. C.', 'soloistRoles': 'A', 'soloistInstrument': 'French Horn'}], 'composerName': 'Lindpaintner,  Peter  Von'}]  "
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nycphil = pd.io.json.json_normalize(d['programs'])\n",
    "nycphil.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>composerName</th>\n",
       "      <th>conductorName</th>\n",
       "      <th>interval</th>\n",
       "      <th>movement</th>\n",
       "      <th>soloists</th>\n",
       "      <th>workTitle</th>\n",
       "      <th>id</th>\n",
       "      <th>orchestra</th>\n",
       "      <th>programID</th>\n",
       "      <th>season</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>52446*</td>\n",
       "      <td>Beethoven,  Ludwig  van</td>\n",
       "      <td>Hill, Ureli Corelli</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>SYMPHONY NO. 5 IN C MINOR, OP.67</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>3853</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>8834*4</td>\n",
       "      <td>Weber,  Carl  Maria Von</td>\n",
       "      <td>Timm, Henry C.</td>\n",
       "      <td>NaN</td>\n",
       "      <td>\"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II</td>\n",
       "      <td>[{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}]</td>\n",
       "      <td>OBERON</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>3853</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3642*</td>\n",
       "      <td>Hummel,  Johann</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}]</td>\n",
       "      <td>QUINTET, PIANO, D MINOR, OP. 74</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>3853</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       ID             composerName        conductorName interval  \\\n",
       "0  52446*  Beethoven,  Ludwig  van  Hill, Ureli Corelli  NaN       \n",
       "1  8834*4  Weber,  Carl  Maria Von  Timm, Henry C.       NaN       \n",
       "2  3642*   Hummel,  Johann          NaN                  NaN       \n",
       "\n",
       "                                                                             movement  \\\n",
       "0  NaN                                                                                  \n",
       "1  \"Ozean, du Ungeheuer\" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II   \n",
       "2  NaN                                                                                  \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                           soloists  \\\n",
       "0  []                                                                                                                                                                                                                                                                                                                                                                                                                                                                 \n",
       "1  [{'soloistName': 'Otto, Antoinette', 'soloistRoles': 'S', 'soloistInstrument': 'Soprano'}]                                                                                                                                                                                                                                                                                                                                                                         \n",
       "2  [{'soloistName': 'Scharfenberg, William', 'soloistRoles': 'A', 'soloistInstrument': 'Piano'}, {'soloistName': 'Hill, Ureli Corelli', 'soloistRoles': 'A', 'soloistInstrument': 'Violin'}, {'soloistName': 'Derwort, G. H.', 'soloistRoles': 'A', 'soloistInstrument': 'Viola'}, {'soloistName': 'Boucher, Alfred', 'soloistRoles': 'A', 'soloistInstrument': 'Cello'}, {'soloistName': 'Rosier, F. W.', 'soloistRoles': 'A', 'soloistInstrument': 'Contrabass'}]   \n",
       "\n",
       "                          workTitle                                    id  \\\n",
       "0  SYMPHONY NO. 5 IN C MINOR, OP.67  38e072a7-8fc9-4f9a-8eac-3957905c0002   \n",
       "1  OBERON                            38e072a7-8fc9-4f9a-8eac-3957905c0002   \n",
       "2  QUINTET, PIANO, D MINOR, OP. 74   38e072a7-8fc9-4f9a-8eac-3957905c0002   \n",
       "\n",
       "               orchestra programID   season  \n",
       "0  New York Philharmonic  3853      1842-43  \n",
       "1  New York Philharmonic  3853      1842-43  \n",
       "2  New York Philharmonic  3853      1842-43  "
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "works_data = pd.io.json.json_normalize(data=d['programs'], record_path='works', \n",
    "                            meta=['id', 'orchestra','programID', 'season'])\n",
    "works_data.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Location</th>\n",
       "      <th>Time</th>\n",
       "      <th>Venue</th>\n",
       "      <th>eventType</th>\n",
       "      <th>id</th>\n",
       "      <th>orchestra</th>\n",
       "      <th>programID</th>\n",
       "      <th>season</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1842-12-07T05:00:00Z</td>\n",
       "      <td>Manhattan, NY</td>\n",
       "      <td>8:00PM</td>\n",
       "      <td>Apollo Rooms</td>\n",
       "      <td>Subscription Season</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>3853</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1843-02-18T05:00:00Z</td>\n",
       "      <td>Manhattan, NY</td>\n",
       "      <td>8:00PM</td>\n",
       "      <td>Apollo Rooms</td>\n",
       "      <td>Subscription Season</td>\n",
       "      <td>c7b2b95c-5e0b-431c-a340-5b37fc860b34</td>\n",
       "      <td>New York Philharmonic</td>\n",
       "      <td>5178</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1843-04-07T05:00:00Z</td>\n",
       "      <td>Manhattan, NY</td>\n",
       "      <td>8:00PM</td>\n",
       "      <td>Apollo Rooms</td>\n",
       "      <td>Special</td>\n",
       "      <td>894e1a52-1ae5-4fa7-aec0-b99997555a37</td>\n",
       "      <td>Musicians from the New York Philharmonic</td>\n",
       "      <td>10785</td>\n",
       "      <td>1842-43</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   Date       Location    Time         Venue  \\\n",
       "0  1842-12-07T05:00:00Z  Manhattan, NY  8:00PM  Apollo Rooms   \n",
       "1  1843-02-18T05:00:00Z  Manhattan, NY  8:00PM  Apollo Rooms   \n",
       "2  1843-04-07T05:00:00Z  Manhattan, NY  8:00PM  Apollo Rooms   \n",
       "\n",
       "             eventType                                    id  \\\n",
       "0  Subscription Season  38e072a7-8fc9-4f9a-8eac-3957905c0002   \n",
       "1  Subscription Season  c7b2b95c-5e0b-431c-a340-5b37fc860b34   \n",
       "2  Special              894e1a52-1ae5-4fa7-aec0-b99997555a37   \n",
       "\n",
       "                                  orchestra programID   season  \n",
       "0  New York Philharmonic                     3853      1842-43  \n",
       "1  New York Philharmonic                     5178      1842-43  \n",
       "2  Musicians from the New York Philharmonic  10785     1842-43  "
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "works_data = pd.io.json.json_normalize(data=d['programs'], record_path='concerts', \n",
    "                            meta=['id', 'orchestra','programID', 'season'])\n",
    "works_data.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>soloistInstrument</th>\n",
       "      <th>soloistName</th>\n",
       "      <th>soloistRoles</th>\n",
       "      <th>id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Soprano</td>\n",
       "      <td>Otto, Antoinette</td>\n",
       "      <td>S</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Piano</td>\n",
       "      <td>Scharfenberg, William</td>\n",
       "      <td>A</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Violin</td>\n",
       "      <td>Hill, Ureli Corelli</td>\n",
       "      <td>A</td>\n",
       "      <td>38e072a7-8fc9-4f9a-8eac-3957905c0002</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  soloistInstrument            soloistName soloistRoles  \\\n",
       "0  Soprano           Otto, Antoinette       S             \n",
       "1  Piano             Scharfenberg, William  A             \n",
       "2  Violin            Hill, Ureli Corelli    A             \n",
       "\n",
       "                                     id  \n",
       "0  38e072a7-8fc9-4f9a-8eac-3957905c0002  \n",
       "1  38e072a7-8fc9-4f9a-8eac-3957905c0002  \n",
       "2  38e072a7-8fc9-4f9a-8eac-3957905c0002  "
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "soloist_data = pd.io.json.json_normalize(data=d['programs'], record_path=['works', 'soloists'], \n",
    "                              meta=['id'])\n",
    "soloist_data.head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Loading Excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "sales_data = pd.read_excel('../Data/sales-funnel.xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Account</th>\n",
       "      <th>Name</th>\n",
       "      <th>Rep</th>\n",
       "      <th>Manager</th>\n",
       "      <th>Product</th>\n",
       "      <th>Quantity</th>\n",
       "      <th>Price</th>\n",
       "      <th>Status</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>714466</td>\n",
       "      <td>Trantow-Barrows</td>\n",
       "      <td>Craig Booker</td>\n",
       "      <td>Debra Henley</td>\n",
       "      <td>CPU</td>\n",
       "      <td>1</td>\n",
       "      <td>30000</td>\n",
       "      <td>presented</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>714466</td>\n",
       "      <td>Trantow-Barrows</td>\n",
       "      <td>Craig Booker</td>\n",
       "      <td>Debra Henley</td>\n",
       "      <td>Software</td>\n",
       "      <td>1</td>\n",
       "      <td>10000</td>\n",
       "      <td>presented</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>714466</td>\n",
       "      <td>Trantow-Barrows</td>\n",
       "      <td>Craig Booker</td>\n",
       "      <td>Debra Henley</td>\n",
       "      <td>Maintenance</td>\n",
       "      <td>2</td>\n",
       "      <td>5000</td>\n",
       "      <td>pending</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>737550</td>\n",
       "      <td>Fritsch, Russel and Anderson</td>\n",
       "      <td>Craig Booker</td>\n",
       "      <td>Debra Henley</td>\n",
       "      <td>CPU</td>\n",
       "      <td>1</td>\n",
       "      <td>35000</td>\n",
       "      <td>declined</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>146832</td>\n",
       "      <td>Kiehn-Spinka</td>\n",
       "      <td>Daniel Hilton</td>\n",
       "      <td>Debra Henley</td>\n",
       "      <td>CPU</td>\n",
       "      <td>2</td>\n",
       "      <td>65000</td>\n",
       "      <td>won</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Account                          Name            Rep       Manager  \\\n",
       "0   714466               Trantow-Barrows   Craig Booker  Debra Henley   \n",
       "1   714466               Trantow-Barrows   Craig Booker  Debra Henley   \n",
       "2   714466               Trantow-Barrows   Craig Booker  Debra Henley   \n",
       "3   737550  Fritsch, Russel and Anderson   Craig Booker  Debra Henley   \n",
       "4   146832                  Kiehn-Spinka  Daniel Hilton  Debra Henley   \n",
       "\n",
       "       Product  Quantity  Price     Status  \n",
       "0          CPU         1  30000  presented  \n",
       "1     Software         1  10000  presented  \n",
       "2  Maintenance         2   5000    pending  \n",
       "3          CPU         1  35000   declined  \n",
       "4          CPU         2  65000        won  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "sales_orders = pd.read_excel('../Data/SampleData.xlsx',sheet_name='SalesOrders')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 43 entries, 0 to 42\n",
      "Data columns (total 7 columns):\n",
      "OrderDate    43 non-null datetime64[ns]\n",
      "Region       43 non-null object\n",
      "Rep          43 non-null object\n",
      "Item         43 non-null object\n",
      "Units        43 non-null int64\n",
      "Unit Cost    43 non-null float64\n",
      "Total        43 non-null float64\n",
      "dtypes: datetime64[ns](1), float64(2), int64(1), object(3)\n",
      "memory usage: 2.4+ KB\n"
     ]
    }
   ],
   "source": [
    "sales_orders.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Creating & Loading Pickled Data\n",
    "* Python pickle module is used for serializing and de-serializing a Python object structure. Any object in Python can be pickled so that it can be saved on disk. \n",
    "* What pickle does is that it “serializes” the object first before writing it to file. Pickling is a way to convert a python object (list, dict, etc.) into a character stream. \n",
    "* The idea is that this character stream contains all the information necessary to reconstruct the object in another python script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>Region</th>\n",
       "      <th>Rep</th>\n",
       "      <th>Item</th>\n",
       "      <th>Units</th>\n",
       "      <th>Unit Cost</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>95</td>\n",
       "      <td>1.99</td>\n",
       "      <td>189.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-01-23</td>\n",
       "      <td>Central</td>\n",
       "      <td>Kivell</td>\n",
       "      <td>Binder</td>\n",
       "      <td>50</td>\n",
       "      <td>19.99</td>\n",
       "      <td>999.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-02-09</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>36</td>\n",
       "      <td>4.99</td>\n",
       "      <td>179.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-02-26</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Pen</td>\n",
       "      <td>27</td>\n",
       "      <td>19.99</td>\n",
       "      <td>539.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-03-15</td>\n",
       "      <td>West</td>\n",
       "      <td>Sorvino</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>56</td>\n",
       "      <td>2.99</td>\n",
       "      <td>167.44</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   OrderDate   Region      Rep    Item  Units  Unit Cost   Total\n",
       "0 2018-01-06     East    Jones  Pencil     95       1.99  189.05\n",
       "1 2018-01-23  Central   Kivell  Binder     50      19.99  999.50\n",
       "2 2018-02-09  Central  Jardine  Pencil     36       4.99  179.64\n",
       "3 2018-02-26  Central     Gill     Pen     27      19.99  539.73\n",
       "4 2018-03-15     West  Sorvino  Pencil     56       2.99  167.44"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sales_orders.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "sales_orders.to_pickle('sales.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>OrderDate</th>\n",
       "      <th>Region</th>\n",
       "      <th>Rep</th>\n",
       "      <th>Item</th>\n",
       "      <th>Units</th>\n",
       "      <th>Unit Cost</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2018-01-06</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>95</td>\n",
       "      <td>1.99</td>\n",
       "      <td>189.05</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2018-01-23</td>\n",
       "      <td>Central</td>\n",
       "      <td>Kivell</td>\n",
       "      <td>Binder</td>\n",
       "      <td>50</td>\n",
       "      <td>19.99</td>\n",
       "      <td>999.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2018-02-09</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>36</td>\n",
       "      <td>4.99</td>\n",
       "      <td>179.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2018-02-26</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Pen</td>\n",
       "      <td>27</td>\n",
       "      <td>19.99</td>\n",
       "      <td>539.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2018-03-15</td>\n",
       "      <td>West</td>\n",
       "      <td>Sorvino</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>56</td>\n",
       "      <td>2.99</td>\n",
       "      <td>167.44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2018-04-01</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Binder</td>\n",
       "      <td>60</td>\n",
       "      <td>4.99</td>\n",
       "      <td>299.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2018-04-18</td>\n",
       "      <td>Central</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>75</td>\n",
       "      <td>1.99</td>\n",
       "      <td>149.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2018-05-05</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>90</td>\n",
       "      <td>4.99</td>\n",
       "      <td>449.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2018-05-22</td>\n",
       "      <td>West</td>\n",
       "      <td>Thompson</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>32</td>\n",
       "      <td>1.99</td>\n",
       "      <td>63.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2018-06-08</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Binder</td>\n",
       "      <td>60</td>\n",
       "      <td>8.99</td>\n",
       "      <td>539.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2018-06-25</td>\n",
       "      <td>Central</td>\n",
       "      <td>Morgan</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>90</td>\n",
       "      <td>4.99</td>\n",
       "      <td>449.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2018-07-12</td>\n",
       "      <td>East</td>\n",
       "      <td>Howard</td>\n",
       "      <td>Binder</td>\n",
       "      <td>29</td>\n",
       "      <td>1.99</td>\n",
       "      <td>57.71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2018-07-29</td>\n",
       "      <td>East</td>\n",
       "      <td>Parent</td>\n",
       "      <td>Binder</td>\n",
       "      <td>81</td>\n",
       "      <td>19.99</td>\n",
       "      <td>1619.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2018-08-15</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>35</td>\n",
       "      <td>4.99</td>\n",
       "      <td>174.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2018-09-01</td>\n",
       "      <td>Central</td>\n",
       "      <td>Smith</td>\n",
       "      <td>Desk</td>\n",
       "      <td>2</td>\n",
       "      <td>125.00</td>\n",
       "      <td>250.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>2018-09-18</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>16</td>\n",
       "      <td>15.99</td>\n",
       "      <td>255.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2018-10-05</td>\n",
       "      <td>Central</td>\n",
       "      <td>Morgan</td>\n",
       "      <td>Binder</td>\n",
       "      <td>28</td>\n",
       "      <td>8.99</td>\n",
       "      <td>251.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2018-10-22</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pen</td>\n",
       "      <td>64</td>\n",
       "      <td>8.99</td>\n",
       "      <td>575.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2018-11-08</td>\n",
       "      <td>East</td>\n",
       "      <td>Parent</td>\n",
       "      <td>Pen</td>\n",
       "      <td>15</td>\n",
       "      <td>19.99</td>\n",
       "      <td>299.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2018-11-25</td>\n",
       "      <td>Central</td>\n",
       "      <td>Kivell</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>96</td>\n",
       "      <td>4.99</td>\n",
       "      <td>479.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>2018-12-12</td>\n",
       "      <td>Central</td>\n",
       "      <td>Smith</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>67</td>\n",
       "      <td>1.29</td>\n",
       "      <td>86.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>2018-12-29</td>\n",
       "      <td>East</td>\n",
       "      <td>Parent</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>74</td>\n",
       "      <td>15.99</td>\n",
       "      <td>1183.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>2019-01-15</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Binder</td>\n",
       "      <td>46</td>\n",
       "      <td>8.99</td>\n",
       "      <td>413.54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>2019-02-01</td>\n",
       "      <td>Central</td>\n",
       "      <td>Smith</td>\n",
       "      <td>Binder</td>\n",
       "      <td>87</td>\n",
       "      <td>15.00</td>\n",
       "      <td>1305.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2019-02-18</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Binder</td>\n",
       "      <td>4</td>\n",
       "      <td>4.99</td>\n",
       "      <td>19.96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2019-03-07</td>\n",
       "      <td>West</td>\n",
       "      <td>Sorvino</td>\n",
       "      <td>Binder</td>\n",
       "      <td>7</td>\n",
       "      <td>19.99</td>\n",
       "      <td>139.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>2019-03-24</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>50</td>\n",
       "      <td>4.99</td>\n",
       "      <td>249.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>2019-04-10</td>\n",
       "      <td>Central</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>66</td>\n",
       "      <td>1.99</td>\n",
       "      <td>131.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>2019-04-27</td>\n",
       "      <td>East</td>\n",
       "      <td>Howard</td>\n",
       "      <td>Pen</td>\n",
       "      <td>96</td>\n",
       "      <td>4.99</td>\n",
       "      <td>479.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>2019-05-14</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>53</td>\n",
       "      <td>1.29</td>\n",
       "      <td>68.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>2019-05-31</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Binder</td>\n",
       "      <td>80</td>\n",
       "      <td>8.99</td>\n",
       "      <td>719.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>2019-06-17</td>\n",
       "      <td>Central</td>\n",
       "      <td>Kivell</td>\n",
       "      <td>Desk</td>\n",
       "      <td>5</td>\n",
       "      <td>125.00</td>\n",
       "      <td>625.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>2019-07-04</td>\n",
       "      <td>East</td>\n",
       "      <td>Jones</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>62</td>\n",
       "      <td>4.99</td>\n",
       "      <td>309.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>2019-07-21</td>\n",
       "      <td>Central</td>\n",
       "      <td>Morgan</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>55</td>\n",
       "      <td>12.49</td>\n",
       "      <td>686.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>2019-08-07</td>\n",
       "      <td>Central</td>\n",
       "      <td>Kivell</td>\n",
       "      <td>Pen Set</td>\n",
       "      <td>42</td>\n",
       "      <td>23.95</td>\n",
       "      <td>1005.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>2019-08-24</td>\n",
       "      <td>West</td>\n",
       "      <td>Sorvino</td>\n",
       "      <td>Desk</td>\n",
       "      <td>3</td>\n",
       "      <td>275.00</td>\n",
       "      <td>825.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>2019-09-10</td>\n",
       "      <td>Central</td>\n",
       "      <td>Gill</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>7</td>\n",
       "      <td>1.29</td>\n",
       "      <td>9.03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>2019-09-27</td>\n",
       "      <td>West</td>\n",
       "      <td>Sorvino</td>\n",
       "      <td>Pen</td>\n",
       "      <td>76</td>\n",
       "      <td>1.99</td>\n",
       "      <td>151.24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>2019-10-14</td>\n",
       "      <td>West</td>\n",
       "      <td>Thompson</td>\n",
       "      <td>Binder</td>\n",
       "      <td>57</td>\n",
       "      <td>19.99</td>\n",
       "      <td>1139.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>2019-10-31</td>\n",
       "      <td>Central</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>Pencil</td>\n",
       "      <td>14</td>\n",
       "      <td>1.29</td>\n",
       "      <td>18.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>2019-11-17</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Binder</td>\n",
       "      <td>11</td>\n",
       "      <td>4.99</td>\n",
       "      <td>54.89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>2019-12-04</td>\n",
       "      <td>Central</td>\n",
       "      <td>Jardine</td>\n",
       "      <td>Binder</td>\n",
       "      <td>94</td>\n",
       "      <td>19.99</td>\n",
       "      <td>1879.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>2019-12-21</td>\n",
       "      <td>Central</td>\n",
       "      <td>Andrews</td>\n",
       "      <td>Binder</td>\n",
       "      <td>28</td>\n",
       "      <td>4.99</td>\n",
       "      <td>139.72</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    OrderDate   Region       Rep     Item  Units  Unit Cost    Total\n",
       "0  2018-01-06     East     Jones   Pencil     95       1.99   189.05\n",
       "1  2018-01-23  Central    Kivell   Binder     50      19.99   999.50\n",
       "2  2018-02-09  Central   Jardine   Pencil     36       4.99   179.64\n",
       "3  2018-02-26  Central      Gill      Pen     27      19.99   539.73\n",
       "4  2018-03-15     West   Sorvino   Pencil     56       2.99   167.44\n",
       "5  2018-04-01     East     Jones   Binder     60       4.99   299.40\n",
       "6  2018-04-18  Central   Andrews   Pencil     75       1.99   149.25\n",
       "7  2018-05-05  Central   Jardine   Pencil     90       4.99   449.10\n",
       "8  2018-05-22     West  Thompson   Pencil     32       1.99    63.68\n",
       "9  2018-06-08     East     Jones   Binder     60       8.99   539.40\n",
       "10 2018-06-25  Central    Morgan   Pencil     90       4.99   449.10\n",
       "11 2018-07-12     East    Howard   Binder     29       1.99    57.71\n",
       "12 2018-07-29     East    Parent   Binder     81      19.99  1619.19\n",
       "13 2018-08-15     East     Jones   Pencil     35       4.99   174.65\n",
       "14 2018-09-01  Central     Smith     Desk      2     125.00   250.00\n",
       "15 2018-09-18     East     Jones  Pen Set     16      15.99   255.84\n",
       "16 2018-10-05  Central    Morgan   Binder     28       8.99   251.72\n",
       "17 2018-10-22     East     Jones      Pen     64       8.99   575.36\n",
       "18 2018-11-08     East    Parent      Pen     15      19.99   299.85\n",
       "19 2018-11-25  Central    Kivell  Pen Set     96       4.99   479.04\n",
       "20 2018-12-12  Central     Smith   Pencil     67       1.29    86.43\n",
       "21 2018-12-29     East    Parent  Pen Set     74      15.99  1183.26\n",
       "22 2019-01-15  Central      Gill   Binder     46       8.99   413.54\n",
       "23 2019-02-01  Central     Smith   Binder     87      15.00  1305.00\n",
       "24 2019-02-18     East     Jones   Binder      4       4.99    19.96\n",
       "25 2019-03-07     West   Sorvino   Binder      7      19.99   139.93\n",
       "26 2019-03-24  Central   Jardine  Pen Set     50       4.99   249.50\n",
       "27 2019-04-10  Central   Andrews   Pencil     66       1.99   131.34\n",
       "28 2019-04-27     East    Howard      Pen     96       4.99   479.04\n",
       "29 2019-05-14  Central      Gill   Pencil     53       1.29    68.37\n",
       "30 2019-05-31  Central      Gill   Binder     80       8.99   719.20\n",
       "31 2019-06-17  Central    Kivell     Desk      5     125.00   625.00\n",
       "32 2019-07-04     East     Jones  Pen Set     62       4.99   309.38\n",
       "33 2019-07-21  Central    Morgan  Pen Set     55      12.49   686.95\n",
       "34 2019-08-07  Central    Kivell  Pen Set     42      23.95  1005.90\n",
       "35 2019-08-24     West   Sorvino     Desk      3     275.00   825.00\n",
       "36 2019-09-10  Central      Gill   Pencil      7       1.29     9.03\n",
       "37 2019-09-27     West   Sorvino      Pen     76       1.99   151.24\n",
       "38 2019-10-14     West  Thompson   Binder     57      19.99  1139.43\n",
       "39 2019-10-31  Central   Andrews   Pencil     14       1.29    18.06\n",
       "40 2019-11-17  Central   Jardine   Binder     11       4.99    54.89\n",
       "41 2019-12-04  Central   Jardine   Binder     94      19.99  1879.06\n",
       "42 2019-12-21  Central   Andrews   Binder     28       4.99   139.72"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_pickle('sales.pkl')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Loading from databases\n",
    "* No matter what your database is, all you need to do is create a connection object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sqlite3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Create connection object for sqlite3\n",
    "\n",
    "<img src=\"images/sqlite3.png\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "conn = sqlite3.connect('../Data/chinook.db')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "albums = pd.read_sql_query(\"select * from albums\", conn, index_col='AlbumId')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>ArtistId</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AlbumId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For Those About To Rock We Salute You</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balls to the Wall</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Restless and Wild</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Let There Be Rock</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Big Ones</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                         Title  ArtistId\n",
       "AlbumId                                                 \n",
       "1        For Those About To Rock We Salute You         1\n",
       "2                            Balls to the Wall         2\n",
       "3                            Restless and Wild         2\n",
       "4                            Let There Be Rock         1\n",
       "5                                     Big Ones         3"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "albums.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Accessing other databases\n",
    "* Install & import python driver for the database\n",
    "* import MySQLdb\n",
    "* mysql_cn= MySQLdb.connect(host='myhost',\n",
    "          port=3306,user='myusername', passwd='mypassword', \n",
    "          db='information_schema')\n",
    "* df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
